Cross-bandwidth adaptation for ASR systems
Abstract
Mismatches between application and training data greatly reduce the performance of automatic speech recognition (ASR) systems. However, collecting sufficient amounts of in-domain, application-specific training data is resource intensive and may not be feasible in resource-scarce environments. Combining limited amounts of in-domain data with feature normalisation and acoustic model adaptation techniques has therefore found wide use in ASR systems. Various approaches have been proposed, but it is not clear which approach to use for a given amount of adaptation data. In this work we investigate standard feature normalisation and model adaptation techniques for the scenario where adaptation between narrow- and wide-band environments must be performed. We systematically vary the amount of adaptation data and compare the performance of the various adaptation techniques at each amount. From this we establish a guideline that an ASR developer can use to choose the best adaptation technique given a size constraint on the adaptation data. In addition, we investigate the effectiveness of a novel channel normalisation technique and compare its performance with that of standard normalisation and adaptation techniques.
Related articles
Robust ASR model adaptation by feature-based statistical data mapping
Automatic speech recognition (ASR) model adaptation is important to many real-life ASR applications due to the variability of speech. Differences in speaker, bandwidth, context, channel, etc. between the speech databases used for the initial ASR models and the application data can be major obstacles to the effectiveness of ASR models. ASR models, therefore, need to be adapted to the application environm...
Rapid Building of an ASR System for Under-Resourced Languages Based on Multilingual Unsupervised Training
This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in the six source languages English, French, German, Spanish, Bulgarian and Polish to build from scratch an ASR system for Vietnamese, an under-resourced language. System building was performe...
Integrating MAP, marginals, and unsupervised language model adaptation
We investigate the integration of various language model adaptation approaches for a cross-genre adaptation task to improve Mandarin ASR system performance on a recently introduced new genre, broadcast conversation (BC). Various language model adaptation strategies are investigated and their efficacies are evaluated based on ASR performance, including unsupervised language model adaptation from...
Time is Money: Why Very Rapid Adaptation Matters
Very rapid adaptation (i.e., adaptation over the range from 0 to 30 sec. of speech data) of ASR systems has great commercial importance and scientific interest. This paper describes some applications of very rapid adaptation, discusses techniques for achieving it, and argues that work in this field has implications for the overall structure of ASR systems. Finally, directions for future work ar...
Towards spoken clinical-question answering: evaluating and adapting automatic speech-recognition systems for spoken clinical questions
OBJECTIVE: To evaluate existing automatic speech-recognition (ASR) systems to measure their performance in interpreting spoken clinical questions and to adapt one ASR system to improve its performance on this task. DESIGN AND MEASUREMENTS: The authors evaluated two well-known ASR systems on spoken clinical questions: Nuance Dragon (both generic and medical versions: Nuance Gen and Nuance Med) a...